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Qs ' In many situations, data are recorded over a period of time and 

may be regarded as realizations of a stochastic process. In this pa- 
per, robust estimators for the principal components are considered 
P^ ' by adapting the projection pursuit approach to the functional data 

C^ , setting. Our approach combines robust projection-pursuit with dif- 

(-H ' ferent smoothing methods. Consistency of the estimators are shown 

under mild assumptions. The performance of the classical and ro- 
bust procedures are compared in a simulation study under different 
contamination schemes. 



C^ 



1. Introduction. Analogous to classical principal components analysis 
^ . (PCA), the projection-pursuit approach to robust PCA is based on finding 

f^ I projections of the data which have maximal dispersion. Instead of using 

the variance as a measure of dispersion, a robust scale estimator s„ is used 



O 



^SJ , for the maximization problem. This approach was introduced by Li and 

Chen (1985), who proposed estimators based on maximizing (or minimizing) 
(3 ■ ^ robust scale. In this way, given a sample Xj G M'^, 1 < i < n, the first robust 

CN ' principal component vector is defined as 



X 



arg max s.„ (a xi , . . . , a x,^ 

{aeM'* : aTa=l} 
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2 BALI, BOENTE, TYLER AND WANG 

The subsequent principal component vectors are obtained by imposing or- 
thogonality conditions. In the multivariate setting, Li and Chen (1985) ar- 
gue that the breakdown point for this projection-pursuit based procedure is 
the same as that of the scale estimator Sn- Later on, Croux and Ruiz-Gazen 
(2005) derived the influence functions of the resulting principal components, 
while their asymptotic distribution was studied in Cui, He and Ng (2003). 
A maximization algorithm for obtaining a was proposed in Croux and Ruiz- 
Gazen (1996) and adapted for high-dimensional data in Croux, Filzmoser 
and Oliveha (2007). 

The aim of this paper is to adapt the projection pursuit approach to the 
functional data setting. We focus on functional data that are recorded over 
a period of time and regarded as realizations of a stochastic process, often 
assumed to be in the L^ space on a real interval. Various choices of robust 
scales, including the median of the absolute deviation about the median 
(mad) and M-estimates of scale are considered and compared. 

Classical functional PGA uses the eigenvalues and eigenfunctions of the 
sample covariance operator. Dauxois, Pousse and Romain (1982) have stud- 
ied the asymptotic properties of these sample functional principal compo- 
nents. Rice and Silverman (1991) proposed to smooth the principal compo- 
nents by imposing an additive roughness penalty to the sample variance. The 
consistency of this method was subsequently studied by Pezzulli and Silver- 
man (1993). Another approach to smoothing the principal components, pro- 
posed in Silverman (1996) and reviewed in Ramsay and Silverman (2005), is 
based on penalizing the norm rather than the sample variance, while Boente 
and Fraiman (2000) considered a kernel-based approach. More recent work 
on estimation of the principal components and the covariance function in- 
cludes Gervini (2006), Hall and Hosseini-Nasab (2006), Hah, Miiller and 
Wang (2006) and Yao and Lee (2006). 

The literature on robust principal components in the functional data set- 
ting, though, is rather sparse. To our knowledge, the first attempt to provide 
estimators of the principal components that are less sensitive to anomalous 
observations was due to Locantore et al. (1999), although their approach is 
multivariate in nature. Gervini (2008) studied a fully functional approach to 
robust estimation of the principal components by considering a functional 
version of the spherical principal components defined in Locantore et al. 
(1999). Hyndman and Shahid Ullah (2007) give an application of a robust 
projection-pursuit approach, applied to smoothed trajectories, but did not 
study the properties of their method in detail. 

In this paper, we introduce several robust estimators of the principal 
components in the functional data setting. Our approach uses a robust 
projection-pursuit combined with various smoothing methods. A primary 
focus of this paper is to provide a rigorous theoretical foundation for this 
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approach to robust functional PCA. In particular, we establish under very 
general conditions the strong consistency of the our proposed estimators. 

In Section 3, the robust estimators of the principal components, based on 
both the raw and smoothed approaches, are introduced. Consistency results 
and the asymptotic robustness of the procedure are established in Section 4, 
while Fisher-consistency of the related functionals is studied in Section 5. 
Section 6 provides conditions under which one of the crucial assumptions 
hold. Selection of the smoothing parameters for the smooth principal com- 
ponents is discussed in Section 7. The results of a Monte Carlo study are 
reported in Section 8. Finally, Section 9 contains some concluding remarks. 
Most proofs are relegated to the Appendix and to the technical supplemen- 
tary material available online; see Bali et al. (2011a). We begin the next 
section with notation and a review of some basic concepts which are utilized 
in later sections. 

2. Preliminaries. 

2.1. Functional principal components analysis. Principal components ana- 
lysis, which was originally developed for multivariate data, has been success- 
fully extended to accommodate functional data, and is usually referred to 
as functional PCA. Principal components analysis for general Hilbert spaces 
can be described as follows. 

Let X G "H be a random element of a Hilbert space "H defined in (Jl, A, P). 
Denote by (•, •) the inner product in T-L and by ||a|p = (a, a). Assume that X 
has finite second moment, that is, EdlXp) < oo. The bilinear operator 
ax '-T-L X H ^ M. defined as ax{a,/3) = cov{{a,X), {I3,X)) leads to a con- 
tinuous operator. The Riesz representation theorem then implies that there 
exists a bounded operator, Tx '■'H^-'H, such that ax{a,f3) = {a,Tx(3). The 
operator Tx is called the covariance operator of X and is linear, self-adjoint 
and continuous. 

Although the general situation in which X G 'H is treated in this paper, 
to help simplify the basic ideas, we first consider the case X S L^(I) where 
X C M is a finite interval. We take the usual inner product for L'^(Z), that is, 
(a, /?) = J-j-a{t)/3(t) dt and denote the covariance function of X by jx {t, s) = 
cov{X(t),X{s)). The corresponding covariance operator Tx '■ L'^i^) -^ -^^(^) 
is such that Tx{a){t) = fxlxit, s)a{s) ds. It is assumed the covariance func- 
tion satisfies fx Ix'^xi^^ ^) dtds < oo. Consequently, Tx is a Hilbert-Schmidt 
operator. 

A Hilbert-Schmidt operator has a countable number of eigenvalues, all 
of which are real. J- will stand for the Hilbert space of such operators with 
inner product defined by (ri,r2)j^ = Yl'T=i{^i'^i-,^2Uj), where {uj : j > 1} 
is any orthonormal basis of L^(Z). Furthermore, since the covariance opera- 
tor Tx is also positive semi-definite, its eigenvalues are nonnegative. As with 
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symmetric matrices in finite-dimensional Euclidean spaces, one can choose 
the eigenfunctions of a Hilbert-Schmidt operator so that they form an or- 
thonormal basis for L^(Z). Let {(j)j :j > 1} and {Xj : j > 1} be respectively 
an orthonormal basis of eigenfunctions and their corresponding eigenvalues 
for the covariance operator Tx, with Xj > Aj+i. The spectral value decom- 
position for Tx can then be expressed as Tx = Yl'T=i'^j^j ® 4>j-, with 
being the tensor product, or equivalently 7x(i; s) = ^^"jLi '^j4'ji't)4>j{s), with 
"^^LiXJ = llFxIlj- = fj fj^xit, s) dt ds. The jth principal component vari- 
able is then defined as Zj = {<j)j,X), which leads to the Karhunen-Loeve 
expansion X{t) = fi{t) + YlTLi^j'Pji't), with fi{t) = E(X(t)) and the Zj's 
being uncorrelated and having variances Xj in descending order. 

In general, for Y = {a,X), which is a linear function of the process {X{s)}, 
we have var(y) = {a,Txoi). An important optimality property of the first 
principal component variable is that it can be defined as the variable Zi = 
{ai,X) such that 

(2.1) var(Zi) = sup var((a,X))= sup {a,Txa). 

{o:||a|| = l} {o:||o|| = l} 

Any solution to (2.1), that is, any a for which the supremum is obtained, 
corresponds to an eigenfunction associated with the largest eigenvalue of 
the covariance operator Tx, that is, ai = (pi and var(Zi) = Ai. If Ai > A2, 
then ai is unique up to a sign change. As in the multivariate setting, the 
other principal components can be obtained successively via (2.1), but under 
the orthogonality condition that {aj,ak) = for j < k. 

2.2. Scale Junctionals and estimates. The basic idea underlying our ap- 
proach is to view principal components as in (2.1), but with the variance 
replaced by a robust scale functional. We first recall the definition of a scale 
functional. Denote by G the set of all univariate distributions. A scale 
functional (Tr : t/ — > [0, -|-cx)) is one which is location invariant and scale 
equivariant, that is, if Ga,b stands for the distribution of aY + b when 
y ~ G, then, crii{Ga,b) = |a|<7R(G), for all real numbers a and b. Two well- 
known examples of scale functionals are the standard deviation, sd(G) = 
{E(y — E(y))^}"^' ^, where y ~ G, and the median absolute deviation about 
the median, mad(G) = cmedian(|y — median(y)|). The normalization con- 
stant c, used in the mad, can be chosen so that its empirical or sample 
version is consistent for a scale parameter of interest. Typically, one chooses 
c = 1/<1>~^(0.75) so that the mad equals the standard deviation at a normal 
distribution. 

The breakdown points, a measure of global robustness, for the standard 
deviation and the MAD are and 1/2, respectively. The mad, however, 
has a discontinuous influence function, which reflects some local instabil- 
ity. Furthermore, the empirical version of the MAD is known to be fairly 
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inefficient at the normal and other distributions; see Huber (1981). In the 
finite-dimensional setting, as reported in Cui, He and Ng (2003) the impact 
of a discontinuous influence function on the efficiency of the estimators of 
the principal directions is even more dramatic covariance function. 

A broader class of robust scale functionals, which includes as special cases 
both the SD and the mad, are the M-scale functionals. An M-scale func- 
tional with a bounded and continuous score function can have both a high 
breakdown point and a continuous and bounded influence function. Also, 
their empirical versions, the M-estimates of scale, can be tuned to have 
good efficiency over a broad range of distributions. Given a location pa- 
rameter fi, an M-scale functionals aM{G) with a continuous score function 
X : K ^^ M can be defined to be a solution to the equation 

Given a location statistic /2„, the corresponding M-estimate of scale is then 
a solution a„ to the M-estimating equation 

(2.3) -tx(^^^)=6. 

If the score function is discontinuous, as is the case with the mad, then 
a slight modification to (2.2) and (2.3) is needed; see Martin and Zamar 
(1993). 

Typically, the score function x is even with x(0) = 0) nondecreasing on M+ 
and with < supa,gjRx(a;) = x(+c«) = lima;^+ooX(2;)- When x(+oo) = 26, 
the M-estimate of scale has a 50% breakdown point, and by choosing x 
properly one can also obtain a highly efficient estimate; see Croux (1994). 
One such popular choice, and the one we use in our Monte Carlo study, is 
the score function introduced by Beaton and Tukey (1974), namely 

(2.4) xciy) = min(3(y/c)2 - 3{y/c)^ + (y/cf, 1) 

with c being a tuning constant chosen so that the corresponding M-estimator 
of scale is consistent for a scale parameter of interest. For example, the 
choice c = 1.56 when 6 = 1/2 ensures that the M-scale functional is Fisher- 
consistent at the normal distribution and has a 50% breakdown point. 

For continuous and nondecreasing score functions x^ the solutions to (2.2) 
and (2.3) are unique, and the simple re-weighting algorithm 






where w{y) = x{y)/y'^ for y ^ and w{{)) = x"(0)) is known to always con- 
verge to the unique solution of (2.3) regardless of the initial value an . In 
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practice, the initial value an is usually taken to be the mad. A discussion 
on the convergence of the algorithm can be found in Maronna, Martin and 
Yohai (2006). 

For a bounded score function x, if the solution o"r,(Go) of (2.2) is unique, as 
it is the case when x is continuous and nondecreasing, then the functional a^i 
is weakly continuous at Gq. Weakly continuity of itr at Go, that is, continuity 
with respect to the weak topology in Q which is given by the Prohorov 
metric, and consistency in a neighborhood of Gq entails robustness at Gq. 
For details, see Huber (1981) and Hampel (1971). 

3. The estimators. We consider several robust approaches in this sec- 
tion and define them on a separable Hilbert space H, keeping in mind that 
the main application will be 'H = L^(Z). From now on and throughout the 
paper, {Xi :! <i <n} denote realizations of the stochastic process X ~ P 
in a separable Hilbert space 7i. Thus, Xi ^ P are independent stochastic 
processes that follow the same law. This independence condition could be 
relaxed, since we only need the strong law of large numbers to hold in order 
to guarantee the results in this paper. 

3.1. Raw robust projection-pursuit approach. Based on property (2.1) of 
the first principal component and given o"r(F) a robust scale functional, 
the raw (meaning unsmoothed) robust functional principal component di- 
rections are defined as 

{4)K,i{P) = argmax<TR(P[a]), 
||a||=l 
(t>K,m{P)= argmax crR(P[a]), 2 < m, 

|a||=l,«eBm 

where P[a\ stands for the distribution of (q, X) when X ~ P and Bm = {a € 
1-L : (a, 0Rj(P)) = 0, 1 < J < m — 1}. We will denote the mth largest principal 
value by 

(3.2) Ar,^(P)=c7|(P[(/.r,„])= max 4(P[a]). 

\\a\\=l,aet3m 

Since the unit ball is weakly compact, the maximum above is attained if the 
scale functional cjr is (weakly) continuous. 

Next, denote by s'^:'H ^ M. the function s^(a) = (T^(P„[a]), where 
crR(P„[a]) stands for the functional cjr computed at the empirical distribu- 
tion of (a, Xi), . . . , {a, Xn)- Analogously, the mapping a:T-L ^M. stands for 
a{a) =o"R(P[a]). The components in (3.1) will be estimated empirically by 

: argmaxs„(a), 

(3-3) {^ IHM 

= argmaxs„(a), 2<m, 

CteBm 
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where Bm = {a GTi: \\a\\ = 1, {a, (J)raw,j) = 0, VI < j < m — 1}. The estima- 
tors of the principal values are then computed as 

(3.4) ARAW,m = S^((^RAW,m), 1 < m. 

3.2. Smoothed robust principal components. Sometimes instead of raw 
functional principal components, smoothed ones are of interest. The advan- 
tages of smoothed functional PCA are well documented; see, for instance, 
Rice and Silverman (1991) and Ramsay and Silverman (2005). One com- 
pelling argument is that smoothing is a regularization tool that might re- 
veal more interpretable and interesting features of the modes of variation for 
functional data. As noted in the Introduction, Rice and Silverman (1991) 
and Silverman (1996) proposed two smoothing approaches by penalizing the 
variance and the norm, respectively. 

To be more specific, Rice and Silverman (1991) estimate the first principal 
component by maximizing over ||a|| = 1, the objective function var((a, X)) — 
p\a, a] , where var stands for the sample variance and \a, (3~\ = Jg a"{t)f3"(t) dt. 
Consistency for these estimators was established by PezzuUi and Silverman 
(1993). 

Another regularization method proposed by Silverman (1996) is to penal- 
ize the roughness through a norm defined via the penalized inner product, 
{a,(3)r = {oi,/3) + rfa,/?]. The smoothed first direction (pi is then the one 
that maximizes vai{{a,X)) over ||a||T- = 1. Consistency of these estimators 
is also established in Silverman (1996) under the assumption that (pj has 
finite roughness, that is, \(pj,(pj~\ < oo. 

Clearly the smoothing parameters p and r need to converge to in order 
to get consistency results. 

Let us consider Tis, the subset of "smooth elements" oiJi. In order to ob- 
tain consistency results, we will assume that (puj (P) € Tis ■ Let D : ^^s -^Tihe 
a linear operator, which we will refer to as the "differentiator." Using D, we 
define the symmetric positive semidefinite bilinear form [•, •] : "Hg x ^s ~^ ^i 
where [a,/3] = {Da,DP). The "penalization operator" is then defined as 
^ :T-Ls ^ M, ^(a) = [«,«], and the penalized inner product as {a,f3)r = 
{a,/3) + r[a,/3]. Therefore, ||a||^ = ||a|p + T'^{a). As in Pezzulli and Silver- 
man (1993), we will assume that the bilinear form is closable. 

Remark 3.1. The most common setting for functional data is to choose 
n = L^(X), ns = {ae L'^{I),a is twice differentiable, and Jj-{a" {t))^ dt < 
oo}, Da = a" and \a,P] = f^ a" (t) f3" (t) dt so that ^^i^a) = f^{a"{t))'^ dt. 

Let (Tr(-F) be a robust scale functional and define Sn(a) and a{a) as 
in Section 3.1. Then we can adapt the classical procedure by defining the 
smoothed robust functional principal direction estimators either: 
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(a) by penalizing the norm as 

<^PN,i = argmaxs„(a) = argmax 



(3.5) 



2,„^ <(/?) 



1 /3^o (/3,/3) + r*(/3)' 

PN,m = argmax s„ (a), 2 < m, 



where Sm,r,PN = {a€7i: \\a\\r = 1, (a,^PNj)r = 0, VI <j<m-l}, 
(b) or by penahzing the scale as 

(pps,i = argmax{s^(a) - p^(a)}, 

||a||=l 

?PS,m = argjiiax{4(a) -/0^(a)}, 2<m, 

a&Bm.ps 



(3.6) 



where Bm,ps = {a € ^: ||a|| = 1, (a,(/>psj) = 0, VI < j < m — 1}. 

The corresponding principal value estimators are respectively defined as 

(3.7) Aps,m = Sn('/'PS,m) and ApN,m = S^((/)pN,m)- 

3.3. Sieve approach for robust functional principal components. Another 
approach, motivated by using S-splines as a smoothing tool, is to con- 
sider the method of sieves. The method of sieves involves approximating an 
infinite-dimensional parameter space by a sequence of finite-dimensional 
parameter spaces B^, which depend on the sample size n, and then estimate 
the parameters on the spaces 0^ rather than G. 

Let {Si}i>i be a basis of V. and define 'Hp„ as the linear space spanned 
hy Si,..., 6p„ and by Sp„ = {a G Tip^ : ||a|| = 1}, that is. 




and5p„ = {a G 7^ :a = ^^!1^ aj(5j, suchthat ||a|p = Y.%iYll'LiO-j'^s{^j,Ss) = 
1}. Note that 5p„ approximates the unit sphere S = {a ^H: \\a\\ = 1}. 
When {6i}i>i is an orthonormal basis, ||a|p = Yl^=i^'^ ~ ^^^ where a = 
(ai, . . . ,ap„)^, hence, the norm of a equals the Euclidean norm of the vec- 
tor a. Define the robust sieve estimators of the principal components as 



(3.8) 



^Si,i = argmaxs„(a), 

aeSp„ 

^si,m = arg max s„ (a) , 2 < m. 



where Bn,m = {a G 5p„ : (a,(/>si,j) = 0, VI < j < m — 1}, and let the principal 
value estimators be 

(3.9) Asi,m = Sn(?^SI,m)- 
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Some of the frequently used bases for functional data are the Fourier, poly- 
nomial, spline and wavelet bases; see, for instance, Ramsay and Silverman 
(2005). 

To the best of our knowledge, the above sieve approach is new to func- 
tional principal component analysis, even if one considers the classical sieve 
estimators, that is, when a^ in (3.8) is the standard deviation. 

3.4. A unified formulation for the robust projection pursuit approaches. 
To help formulate a unified approach to the different estimators considered 
in Sections 3.2, 3.2 and 3.3, let the products p'^{a) or t^(q) be defined 
as when p = or r = 0, respectively, even when a ^ Hs for which case 
^(a) = oo. Moreover, when pn = oo, define "Hp,^ = H. All the projection 
pursuit estimators considered in the previous subsections then can be viewed 
as special cases of the following general class of estimators: 

H = argmax {s„(a) — p^(a)}, 

(3.10) { ^ Pn N II 

'im = argmax{s„(a) — /9^(a)}, 2 < m, 

where l3m,T = {a £ Hp„ : \\a\\r = 1, {a,(l)j)r = 0,V1 < j < m — 1}. 

With this definition and by taking pn = oo, the raw estimators are ob- 
tained when p = T = 0, while i?i>pN,m and 4'ps,m correspond to p = and 
r = 0, respectively. On the other hand, the sieve estimators correspond a fi- 
nite choice for p„ and t = p = 0. 

4. Consistency results. In this section, we show that under mild condi- 
tions the functional (t)ji,m{P) and \K,m{P) defined through (3.1) and (3.2) 
are weakly continuous. Moreover, we state conditions that guarantee the 
consistency of the estimators defined in Section 3. Proofs for this section 
can be found in the Appendix and in the supplemental article [Bali et al. 
(2011a)]. 

To derive the consistency of the proposed estimators, we need the follow- 
ing assumptions: 

(50) For some q>2 and 1 < j < <?, (p'nj{P) are unique up to a sign change. 

(51) o":?^ — T-M is a weakly continuous function, that is, continuous with 
respect to the weak topology in H. 

(52) sup||„||=i|s2(a)-fj2(a)|^0. 

Note that condition (SO) holds if and only if Ar^i(P) > • • ■ > AR^q+i(P). 

Some remarks, (i) (SI) holds when the scale functional a^n is a continuous 
functional (with respect to the weak topology under the Prohorov distance). 
This is because a^ converges weakly to a, which implies {a^^X) — > {a,X) 
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and hence a^{P[ak]) — )■ (Tr(P[q]). For the case when the scale functional 
is the standard deviation, and the underlying probability P has a com- 
pact covariance operator Tx, we see from the relationship a'^(a) = (a, Txct) 
that condition (SI) holds, even though the standard deviation itself is not 
a weakly continuous functional. 

(ii) Since there exists a metric d generating the weak topology in H and 
that the closed ball Vr = {a: \\a\\ < r} is weakly compact, we see that (SI) 
implies that a{a) is uniformly continuous with respect to the metric d and 
hence, with respect to the weak topology, over Vr- 

(iii) Assumption (S2) holds for the classical estimators based on the sam- 
ple variance since the empirical covariance operator, T, is consistent in 
the unit ball. Indeed, as shown in Dauxois, Pousse and Romain (1982), 
||r — Tx\\ —-> 0, which entails that sup||Q||=]^|s^(a) — cr^(a)| < ||r — Tx\\ ——> 
0. However, this assumption may seem harder to verify for other scale func- 
tionals since the unit sphere S = {\\a\\ = 1} is not compact, and s'^{a) is 
usually not defined through a covariance operator estimator. To be more 
precise, even under some conditions to be considered in Section 5, there 
exists a compact operator T such that a{a) = (a, Fa), s^(a) cannot be ex- 
pressed as (a,F„a) for some consistent estimator F^ of F. Corollary 6.1 
in Section 6 establishes that (S2) holds for any scale functional cr that is 
continuous with respect to the weak topology. 

The following lemma, whose proof can be found in Section B of the tech- 
nical supplemental article given in Bali et al. (2011a), is useful for deriving 
the consistency and continuity of the principal direction estimators. In this 
lemma and in the subsequent theorems, it should be noted that (</), c/))^ -^ 1 
implies, under the same mode of convergence, that the sign of (f) can be 
chosen so that c/)^ (p- 

For the sake of simplicity, denote by Arj = Arj(P) and (/)rj = (^fj,j(P). 

Lemma 4.1. Let (pm ^S he such that {(t)m,(t>j) —-^ for j ^ m and as- 
sume that (SO) and (SI) hold. Then: 

(a) Ifa^{^i)^(7^{(l)K,i), then {^i,(I)r,i)^ ^1. 

(b) Given 2 <m < q, if a'^{(l)m) ^ cr'^{<pR,m) and (ps ^ <pR,s, for 1 < 
s<m-l, then {(pm,4>R,m)'^ -^1. 

Let dpR{P, Q) stand for the Prohorov distance between the probability 
measures P and Q. Thus, P„ — > P if and only if (ipR(P„,P) — )■ 0. Theo- 
rem 4.1 below establishes the continuity of the functionals defined in (3.1) 
and (3.2), and hence the asymptotic robustness of the estimators derived 
from them, as defined in Hampel (1971). This can be seen just by replacing 
almost sure convergence by convergence in its statement. As it will be shown 
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in Section 6, the uniforni convergence required in assumption (ii) below is 
satisfied, for instance, if cjr is a continuous scale functional when P„ — > P. 
To accommodate data driven smoothing parameters a more general frame- 
work is considered in Theorem 4.1, which allows for the smoothing param- 
eters Tn and pn to be random, such that t„ ^-4 and p„ ^-4- 0. 

Theorem 4.1. Let Pn be a sequence of probability measures and t = 
Tn > 0, p = Pn > be random smoothing parameters. Denote by o"^(a) = 
(Tp^(Pn[a]) and define Am = 0"^(</'m) with 



argmax{(T„(a) — p"^{a)}, 



(4.1) 



=1 



argmax{(T„(a) — p^(a)}, 2 < m. 



where Bm,T = {a Gl-L: \\a\\r = l,(a,(^j)r = 0,V1 < j <m — 1}. Let P be 
a probability measure satisfying (SO). Assume that: 

(i) (SI) holds; 
(ii) sup||„||=i|a2(Q) -a|(P[Q])| ^0; 

(iii) Tn ^-4- and pn ——>■ 0; 

(iv) moreover, if Tn> or pn > 0, for all n > uq, assume that ^rj € 'Hs? 
that is, ^{(j)Rj) < oo, for all 1 < j <q. 

Then: 

(a) Ai ^-4 Ar^i and (t'^{4'i) ^-4 (T^((/)r^i). Moreover, p"^{4'i) ^-4 and 
t\(J)i,(J)i'] ——> 0, and so \\(j)i\\ ——> I; 

(b) (^i,,^R,i)2^1; 

(c) for any2<m<q, if cpi -^ cp^^i, T'^icpe) -^ and p^(0^) -^ for 
l<i<m-l, then Am -^ cr'^i4'R,m) and a'^Q'm) ^ cr'^ i4>R,m) ■ Moreover, 
p'^((t)m) -^0, T^(^.m)J^O and so, ll^mll ^ 1; 

(d) forl<m<q, ((^m,0R,m)^ -^ 1- 

Note that assumption (ii) corresponds to (S2) when P„ is the empirical 
probability measure. On the other hand, when (Tr,(-) is a continuous scale 
functional. Theorem 6.2 implies that (ii) holds whenever dpR(P„,P) ^^0. 
Moreover, if <tr(-) is a continuous scale functional and P satisfies (SO), The- 
orem 4.1 entails the continuity of the functional 4'R,ji') a-i^d Api,j(-) at P, 
for I < j < q, and so the proposed estimators are qualitatively robust and 
consistent. In particular, the estimators are robust at any elliptical distribu- 
tion £{p,T), as defined in Section 5, such that the largest q+l eigenvalues 
of the operator T are all distinct. 
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Theorem 4.1 establishes the consistency of the raw estimators of the prin- 
cipal components under (SO) to (S2) by taking p = t = 0. It also shows that 
proposals (3.5) and (3.6) give consistent estimators if cp^j € Tis, ^<j<q- 
In Bali et al. (2010), it is shown that the estimators (j)pN,m and ApN,m de- 
fined in (3.5) and (3.7) are still consistent if ^rj € Hs, I < j <q, where Tis 
stands for the closure of Tis- The condition cp^ij G ^s generalizes the as- 
sumption of smoothness, 0rj € 'Hsi required in Silverman (1996) and holds, 
for example, when T-Ls is a dense subset of H. 

Theorem 4.2 establishes the consistency of the estimators of the principal 
directions defined through the sieve approach given in (3.8). Below we give 
a separate statement for the consistency of the sieve estimators to avoid im- 
posing additional burdensome assumptions, such as smoothness conditions 
for the basis elements, whenever either r ^ or p 7^ in (3.10). Its proof is 
relegated to the Section C of the technical supplement [Bali et al. (2011a)]. 

Theorem 4.2. Let 0si,m cL^d Xsi,m be the estimators defined in (3.8) 
and (3.9), respectively. Under (SO) to (S2), if pn—^ 00, then: 

(a) Asi,i -^it2(0r,i) and o-2(^si,i) -^o-^(<^R,i)- 

(b) Given 2 <m <q, if (psi/ ——^ 4'R,i, for 1 < i <m — 1, then Xsi,m ——^ 
cr'^{4'K,m) and cr'^ i(j)si,m)J^ cr'^ {(pR,rn) ■ 

(c) For l<m<q, (0si,m,0R,m)^ ^ 1- 

5. Fisher-consistency under elliptical distributions. The results in Sec- 
tion 4 ensure that, under mild conditions, the estimates of the principal 
directions converge almost surely to (pR^m defined in (3.1). An important 
point to highlight is what the functions (pR^m represent, at least in some par- 
ticular situations. This section focuses on showing that, for the functional 
elliptical families defined in Bali and Boente (2009), the functionals <^R,m(-P) 
and AR^m(-P) have a simple interpretation. In particular, our results hold for 
the functional elliptical family, but are not restricted to it. We recall here 
their definition for the sake of completeness. 

Let X be a random element in a separable Hilbert space Ti and /i G "H. Let 
T ■.'H^'H he a self-adjoint, positive semidefinite and compact operator. We 
will say that X has an elliptical distribution with parameters (/U, F), denoted 
as X ~ £{fJ-, F), if for any linear and bounded operator AiT-L ^- M , AX has 
a multivariate elliptical distribution with parameters Afj, and 74FA*, that 
is, AX ~ £d{Afi,ATA*), where A* -.MP ^ "H stands for the adjoint operator 
of A. As in the finite-dimensional setting, if the covariance operator, Tx, 
of X exists, then Tx = oF, for some a G R. 

The elliptical distributions in 7i include the Gaussian distributions. Other 
elliptical distributions can be obtained from the following construction. Let Vi 
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be a Gaussian element in % with zero mean and covariance operator Vy^ , 
and let Z he a random variable independent of Vi. Given fj, (^7i, define 
X = fi -\- ZVi. Then, X has an elliptical distribution £{fi,T) with the op- 
erator r being proportional to Ty^- Note that T exist even if the second 
moment of X do not exist. For random elements which admit a finite 
Karhunen-Loeve expansion, that is, X{t) = fi{t) + Yl'j=i^j Uj(pj{t), the 
assumption that X has an elliptical distribution is analogous to assuming 
that U = (C/i, . . . , Uq)"^ has a spherical distribution. 

Lemma 5.1 below states the Fisher-consistency of the functionals defined 
through (3.1) under the following assumption: 

(S3) There exists a constant c > and a self-adjoint, positive semidefi- 
nite and compact operator Fq, such that for any a gH, we have a'^{a) = 
c{a,TQa). 

Its proof follows immediately and is thus omitted. Note that (S3) entails 
that the function a:T-L^>-M. defined as a{a) = aii{P[a]) is weakly continu- 
ous, hence (SI) holds. Besides, as a consequence of Lemma 5.1, (SO) holds 
under (S3) if the q largest eigenvalues of Fq are distinct. 

Lemma 5.1. Let 0R,m <ind Ar^^ be the functionals defined in (3.1) 
and (3.2), respectively. Let X ^ P be a random element such that (S3) 
holds. Denote by Xi > X2 > ■ ■ ■ the eigenvalues of Fq and by (j)j the eigen- 
function of Fq associated to Xj. Assume that for some q>2, and for all 
^ < 3 <q, Ai > A2 > • • • > Ag > Ag+i . Then, we have that 4>rj{P) = cpj and 
Ar,,(P) = cA,. 

For any distribution possessing finite second moments, if the scale func- 
tional is taken to be the standard deviation, then (S3) holds with Fq = Fx- 
When considering a robust scale functional, (S3) holds if X has an ellipti- 
cal distribution £{fj,,T) taking Fq = F, and so Lemma 5.1 entails that the 
functionals 0rj(P) defined through (3.1) are Fisher-consistent. As in the 
finite-dimensional setting, the scale functional itr can be calibrated to at- 
tain Fisher-consistency of the principal values. 

Assumption (S3) ensures that we are estimating the target directions. 
It may seem restrictive since it is difficult to verify outside the family of 
elliptical distributions except when the scale is taken to be the standard 
deviation. However, even in the finite-dimensional case, asymptotic prop- 
erties have been derived only under similar restrictions when considering 
projection-pursuit estimators. For instance, both Li and Chen (1985) and 
Croux and Ruiz-Gazen (2005) assume an underlying elliptical distribution 
in order to obtain consistency results and influence functions, respectively. 
Also, in Cui, He and Ng (2003) the influence function of the projected data 
is assumed to be of the form /i(x,a) = 2(7(F[a])IF(x,(Ta;i*b); where -F[a] 
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stands for the distribution of ax when ^ --^ F. This condition, though, 
primarily holds when the distribution is elliptical. 

Remark 5.1. An alternative to the robust projection pursuit approach 
for functional principal components is to consider the spectral value decom- 
position of a robust covariance or scatter operator. The spherical principal 
components, noted in the Introduction, which were proposed by Locantore 
et al. (1999) and further developed by Gervini (2008), apply this approach 
using the spatial covariance operator. The spatial covariance operator is 
defined to be 

V = E((X -7])0{X- 7])/\\X - 7]f) 

with 7] being the spatial median, that is, 

(5.1) r/ = argminE(||X-6'||- ||X||). 

een 

The spatial median is sometimes referred to as the multivariate L^ median, 
but this is a misnomer since the the norm in (5.1) is the L^ norm. Note that 
when the norm is replaced by the square of the norm in (5.1), the resulting 
parameter is the mean. 

Gervini (2008) proved the Fisher-consistency of the eigenfunctions of the 
spatial covariance operator, but under the additional assumption that X 
has a finite Karhunen-Loeve expansion so that V has only a finite number 
of nonzero eigenvalues, which is essentially the multivariate setting. Un- 
like the projection pursuit approach, though, under an elliptical model the 
eigenvalues of V are not proportional to the eigenvalues of the shape param- 
eter r. Consequently, as discussed, for example, in Harden (1999), Boente 
and Fraiman (1999) and Visuri, Koivunen and Oja (2000), this implies that 
even if the second moments exist, the amount of variance explained by a prin- 
cipal component variable is not equivalent to the ratio of the eigenvalue to 
the sum of all the eigenvalues. That is, Afc/^°^^Aj is not the same as 

the explained proportion ^k/Y17Li -^j; where Afc and Xk are the kth. largest 
eigenvalue of V and T respectively. 

In the multivariate setting, it is also known that the eigenvectors obtained 
from the sample spatial covariance matrix can be extremely inefficient esti- 
mates whenever the eigenvalues differ greatly; see Croux (1999). Intuitively, 
the reason for this inefficiency is that the spatial covariance matrix down- 
weights observations according to their Euclidean distance from the center. 
This seems reasonable when the distribution is close to being spherical, but 
less so when there are strong dependencies in the variables. In some sense, 
this is the antithesis of PGA, since one is usually interested in principal 
components when one suspects the latter. 
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As noted in Maronna, Martin and Yohai (2006), there is a vast liter- 
ature on robust estimates for covariance matrices, such as M-estimates, 
(S-estimates and the MCD, among others. These estimates downweight ob- 
servations relative to the shape of the data cloud. It may seem reasonable 
then to try to extend these estimates to the functional setting. An important 
feature of these estimates, though, is that they are affine equivariant, and as 
shown in Tyler (2010), this implies that, when the sample size is no greater 
than the dimension plus one, such estimates are simply proportional to the 
sample covariance matrix. In the functional data setting, the sample size is 
always less than the dimension, which is infinite. Thus, at this time, we view 
the robust projection-pursuit approach as more viable. 

6. Some uniform convergence results. In this section, we show that when 
the scale functional is continuous with respect to the Prohorov distance, (S2) 
and more generally, condition (ii) in Theorem 4.1 hold whenever P„ — > P. 
To derive these results, we will first state some properties regarding the 
weak convergence of empirical measures that hold not only in L^ (I) but in 
any complete and separable metric space. These properties may be useful in 
other settings. The proofs for the theorems in this section are relegated to 
Section D of the supplemental article [Bali et al. (2011a)]. 

Let A^ be a complete and separable metric space (Polish space) and B 
the Borel u-algebra of A4. Lemma 6.1, which is a restatement of Theorem 3 
in Varadarajan (1958), ensures that the empirical measures converge weakly 
almost surely on a Polish space to the probability measure generating the 
observations. 

Lemma 6.1. Let {Q,A,'F) be a probability space and Xn -^ —^ M, n G N, 
be a sequence of independent and identically distributed random elements 
such that Xi^ P. Assume that A4 is a Polish space, and denote by Pn the 
the empirical probability measure, that is, Pn[A) = (1/n) ^"^^ /^(^i) with 
lA_{Xi) = 1 if Xi € A and elsewhere. Then, Pn — > P almost surely, that 

is,dpR{Pn,P)^0. 

Let P be a probability measure in A^, a separable Banach space, and 
let A4* denote the dual space. For a given / G Ai* , define P[f] as the real 
measure of the random variable f{X), with X ~ P. 

Theorem 6.1. Let {Pn}neN o-nd P be probability measures defined on Ai 
such that dpR{Pn,P)^0. Then, sup||j||^=i (ipR(P„[/],P[/]) ->0. 

When the Banach space A4 above is a separable Hilbert space Ti, the Riesz 
representation theorem implies that for / G Ti* with ||/||* = 1, there exists 
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a gT-L such that f{X) = {a,X). The following result states that when (Tr is 
a continuous scale functional, uniform convergence can be attained and so, 
assumption (ii) in Theorem 4.1 is satisfied. 

Theorem 6.2. Let {Pn}neN o-iT-d P he probability measures defined on 
a separable Hilbert space %, such that (ipR(P„,P) — t- 0. Let ur be a contin- 
uous scale functional. Then, supiuii^]^ [^^(^^[a]) — crR(P[a])| — ^-O. 

Using Lemma 6.1 and Theorem 6.2, we get the following result that shows 
that (S2) holds if ur is a continuous scale functional. 

Corollary 6.1. Let P be a probability measure in a separable Hilbert 
space %, Pn he the empirical measure of a random sample Xi, . . . ,X„ with 
Xi ~ P, and (Tr he a continuous scale functional. Then we have that 
sup||„ll=i|(TR(Pn[a]) -o-r(P[q])| -^0. 

7. Selection of the smoothing parameters. The selection of the smooth- 
ing parameters is an important practical issue. The most popular general 
approach to address such a selection problem is to use the cross-validation 
methods. In nonparametric regression, the sensitivity of L^ cross-validation 
methods to outliers has been pointed out by Wang and Scott (1994) and 
by Cantoni and Ronchetti (2001), among others. The latter also proposed 
more robust alternatives to L^ cross-validation. The idea of robust cross- 
validation can be adapted to the present situation. Assume for the moment 
that we are interested in a fixed number, £, of components. We propose to 
proceed as follows: 

(CVl) Center the data. That is, define Xi = Xi —fl where /t is a robust 
location estimator, such as the functional spatial median defined in Gervini 
(2008). 

(CV2) For the penalized roughness approaches and for each m in the 
range l<m<l and r > 0, let (i))n,T denote the robust estimator of the ?Tith 
principal direction computed without the jth observation. 

(CVS) Define X/(t) = Xj - 7r{Xj;3f^^), for 1 < j < n, where Tr{X;C) 
stands for the orthogonal projection of X onto the linear (closed) space C, 

and C\ stands for the linear space spanned by (j)\^ ,. . . ,<j)l^ . 

(CV4) Let RCYeir) = (7li\\X^{T)l . . . ,\\X^{t)\\), where o-„ is a robust 
measure of scale about zero. We then choose r^ to be the value of r which 
minimizes RCVf(T). 

By a robust measure of scale about zero, we mean that no location es- 
timator is applied to center the data. For instance, in the classical setting, 
one takes cr'^{zi, . . . , z„) = (1/n) "^^^i zf, while in the robust situation, one 
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might choose cr„(zi, . . . , Zn) = median(|zi |, . . . , |z„|) or to be an M-estimator 
satisfying equation (2.3) when setting /2„ = 0. 

For large sample sizes, it is well understood that cross-validation methods 
can be computationally prohibitive. In such cases, K-iold cross-validation 
provides a useful alternative. In the following, we briefly describe a robust 
K-fo\d cross-validation procedure suitable for our proposed estimates. 

(Kl) First center the data as above, using Xi= Xi — Jl. 
(K2) Partition the centered data set {Xi} randomly into K disjoint sub- 
sets of approximately equal sizes with the jth subset having size rij >2, 

Z]i=i"'j = ^- Let {^j }i<i<nj be the elements of the jth subset, and 
{X| }i<i<n-nj denote the elements in the complement of the jth sub- 
set. The set {X^ }i<i<n-nj will be the training set and {X^ }i<j<„j. the 
validation set. 

(K3) Similar to step (CV2) but with leaving the jth validation subset 
{Xj }i<j<„^. out instead of the jth observation. 

(K4) Define X- (r) the same way as in step (CVS), using the validation 
set. For instance, ^^^-^(t) = X^^ - 7r(Xp^;4~^'^), 1 < « < rij, where c\~^^ 
stands for the linear space spanned by <j)\^ ,. . . ,(j)\^ . 

(K5) Let RCV,,kcv(t) = Ef=i<^n(II^P^(^)ll, ■ • ■ , ll^if ^(t)|!), and choo- 
se r„ to be the value of r which minimizes RCV^^kcv (''")• 

A similar approach can be developed to choose p.„ for the sieve estimators. 

8. Monte Carlo study. The results of Section 4 established under gen- 
eral conditions the consistency of the various robust projection pursuit ap- 
proaches to functional principal components analysis. The classical approach 
to functional principal components analysis also yields consistent estimates, 
provided second moment exists. A study of the influence functions and the 
asymptotic distributions of the various procedures would be useful to com- 
pare them. We leave these important and challenging theoretical problems 
for future research. For now, to help illuminate possible differences in the 
various approaches, we present in this section the results of a Monte Carlo 
study. 

8.1. Algorithms. All the computational methods to be considered here 
are modifications of the basic CR algorithm proposed by Croux and Ruiz- 
Gazen (1996) for the computation of principal components using projection- 
pursuit. The basic algorithm applies to multivariate data, say TTi-dimensional, 
and requires a search over projections in W^ . The grid algorithm described 
in Croux, Filzmoser and Oliveira (2007) can also be considered, in partic- 
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ular, when the number of variables m is larger than the sample size n. For 
the sake of completeness, we briefly recall the CR algorithm. 

Let Y = (yi, . . . ,y„) be the sample in R'", and let /in(Y) be a location 
estimate computed from this sample. Let 1 <q <m be the desired number 
of components to be computed and denote by ^„ the univariate projection 
index to be maximized. In the multivariate setting, the index S,n corresponds 
to a robust univariate scale statistic. 

(CRl) For k = l,sety\ —Yi — P-nO^)- Let the set of candidate directions 
for the first principal direction be ^„^i(Y) = {y^^ /fj, 1 < i < n} where I'f = 

Yi yl • We then define vi = argmaXa£^^^^^(Y) ^n(a'^yi, • • • , a'^yn). 

(CR2) For 2 < k < q, define recursively z| = v^_j^yj and y^^ = 

y ■ ~ - 4 ~ -"vfe-i = y| - vrvfe_i(y| ), where vrv^.^ly) stands for the or- 
thogonal projection of y over the linear space Vk-i spanned by vi, . . . , Vfc_i. 
Let the set of candidate directions for the kth principal direction be 

-4n,fc(Y) = {y\ '/fi,! <i<n} where uf = yf y| \ and define v^ = 
argmaxag^^^ ^.(Y) ^nla'^yi, . . . , a'^y™). 

The vectors v^ then yield approximations to the kth principal direction, for 
1 < A; < g, and approximate scores for the kth. principal variable are given by 
z| ' = v^yj, for 1 <i <n. As mentioned in Croux and Ruiz-Gazen (1996), 
the CR algorithm makes no smoothness assumptions on the index ^„, is 
simple and fast, and requires only 0{n) computing space. 

To apply the algorithm to functional data when considering either the 
raw or a penalized approach, we first discretize the domain of the ob- 
served function over m equally spaced points in 1= [—1,1]. The result- 
ing multivariate observations are then y, = {Xi{ti), . . . , Xi{tm)) , where 
to = —1 < ti < ■ ■ ■ < tm < tm+i = 1- The index ^„ in the algorithm depends 
on the approach being used. For instance, for the raw robust estimate and 
for those penalizing the norm the index is a robust scale. On the other hand, 
for the robust penalized scale approach the index is the square of the ro- 
bust scale plus the penalization term. Also, for the penalized norm approach 
the orthogonal projection 7rvj._i(y) in step (CR2) is with respect to the in- 
ner product (■, ■)r so, the finite-dimensional inner product is modified as in 
Silverman (1996). The resulting directions v^ then give numerical approxi- 
mations for {(/)fc(ti), . . . , (pkitm)}- One can then interpolate or use smoothing 
methods to obtain (f>kit) for t^X. 

For the sieve approach, let (5i,...,(5p„ be an orthonormal basis for "Hp^, 
which can be obtained by using Gramm-Schmidt on the original basis. For 
a € T-Lp^^ we then have {a,Xi) = SL^yi, where a = X^^liOj'Jj, a = (oi, . . . , 

'^Pn)'^) y« = ii^ii^i)^ • • • ) (^4 5 ^p„))^- Consequently, we can take m=pn and 
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apply the CR algorithm to the inner scores yj. The inner scores are com- 
puted numerically by approximating the integrals over a grid of 50 points. 
A numerical approximation for (j)k{t) is then given by Yl^=i'^k,jSj{'t) with 

8.2. The estimators. There are three main characteristics that distin- 
guish the different estimators: the method of centering in the first step of 
the CR algorithm, the scale function being used and the type of smoothing 
method. 

Centering: For classical procedures, that is, those based on the standard 
deviation, we use the sample mean as the centering point for the trajectories. 
For the robust procedures, that is, those based on mad or M-SCALE, we 
center the data by using either (i) the component-wise sample median, that 
is, the median at each time point, or (ii) the sample spatial median; see (5.1). 
It turns out that the two robust centering methods produced similar results, 
so only the results for the spatial median are reported. 

Scale function: Three scale estimators are considered here: the classi- 
cal standard deviation (sd), the median absolute deviation (mad) and an 
M-estimator of scale (M-SCALe). For the M-estimator, we use the score 
function (2.4) introduced by Beaton and Tukey (1974), as discussed in Sec- 
tion 2.2, with c = 1.56, 6 = 1/2. 

Smoothing parameters r and p: For both the classical and robust proce- 
dures defined in Section 3.2, a penalization depending on the L^ norm of 
the second derivative, multiplied by a smoothing factor, is included, that is, 
'J'(a) = j_^{a"{t)Y dt. Again the integral is computed over the same grid 
of points ti,...,fm5 and the second derivative of a at tj is approximated 
by {a(tj_|_i) — 2a{ti) + a(tj_i)}/(ij_|_i — tj)^, since we choose an equidistant 
grid of points. Note that when p = r = 0, the raw estimators defined in 
Section 3.1 are obtained. 

Sieve: Two different sieve basis are considered: the Fourier basis obtained 
by taking 5j to be the Fourier basis functions, and the cubic S-spline basis 
functions. The Fourier basis used in the sieve method is the same basis used 
to generate the data. 

In all tables, the estimators corresponding to each scale choice are labeled 
as SD, MAD, M-SCALE. For each scale, we consider four estimators, the raw 
estimators where no smoothing is used, the estimators obtained by penaliz- 
ing the scale function defined in (3.6), those obtained by penalizing the norm 
defined in (3.5) and the sieve estimators defined in (3.8). In all tables, as in 
Section 3, the jth principal direction estimators related to each method are 
labeled as </>raw,j! '?^pSj) '/'pnj and 4>si,j, respectively. 
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When using the penahzed estimators, several values for the penalizing 
parameters r and p were chosen. Since large values of the smoothing pa- 
rameters make the penalizing term the dominant component regardless of 
the amount of contamination considered, we choose r and p equal to an~^ 
for a = 3 and 4 and a equal to 0.05, 0.10, 0.25, 0.5, 0.75, 1, 1.5 and 2. 

For the sieve estimators based on the Fourier basis, ordered as {1, cos('7rx), 
sin(7rx), . . . ,cos((7„7rx),sin(g„7r3;), . . .}, the values Pn = 2qn with qn = 5, 10 
and 15 are used, while for the sieve estimators based on the i?-splines, the 
dimension of the linear space considered is selected as pn = 10, 20 and 50. 
The basis for the i?-splines is generated from the R function cSplineDes, 
with the knots being equally spaced in the interval [—1, 1] and the number 
of knots equal to p„ + 1. To conserve space, we only report here the results 
corresponding to Pn = 30 and pn = 50 for the Fourier and i?-spline basis, 
respectively. More extensive simulation results are listed in the technical 
report [Bah et al. (2010)]. 

8.3. Simulation settings. The sample was generated using a finite Karhu- 
nen-Loeve expansion with the functions, (pi : [—1, 1] — > M, i = 1,2,3, where 
(j)i{x) = sin(47r2;), (j)2{x) = cos(77rx) and 03 (x) = cos(157rx). It is worth notic- 
ing that, when considering the sieve estimators based on the Fourier ba- 
sis, the third component cannot be detected when Qn < 15, since in this 
case 4>3{x) is orthogonal to the estimating space. Likewise, the second com- 
ponent cannot be detected when g„ < 7. 

We performed NR = 1,000 replications generating independent samples 
{^i}?=i of size n = 100 following the model Xi = Zucpi + Zi2(t)2 + -^is^Sj 
where Zj = {Zii,Zi2,Zi^)'^ are independent vectors whose distribution will 
depend on the situation to be considered. The central model, denoted Co, 
corresponds to Gaussian samples. We also consider four contaminations of 
the central model, labeled C2, Cs^a, C^^b and C23 depending on the compo- 
nents to be contaminated. Contamination models are commonly considered 
in robust statistics since they tend be the more difficult models to be robust 
against and are the basis for the concept of bias robustness, see Maronna, 
Martin and Yohai (2006) for further discussion. In all these situations Zn, Zj2 
and Zj3 are also independent. For each of the models, we took ui = 4, cr2 = 2 
and (T3 = 1. The central model and the contaminations can be described as 
follows: 

Co: Za^NiO,al), Zi2^N{0,ai) and Z.s -^ N{0,al). 

C2: Zi2 are independent and identically distributed as 0.8A^(0,o"|) + 
0.2Ar(10,0.01), while Zn ~iV(0,crf) and ^^3 ~ 7V(0,cr|). This contamina- 
tion corresponds to a strong contamination on the second component and 
changes the mean value of the generated data Zj2 and also the first principal 
component. Note that var(Zj2) = 19.202. 
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Csy. Za ~ N{0,al), Z^ ~ A^(0,a2) and Z,3 ~ 0.8iV(0,ai) + 0.2Af(15, 
0.01). This contamination corresponds to a strong contamination on the 
third component. Note that var(Zj3) = 36.802. 

C3,b: Zii ~iV(0,af), Z^a ~ iV(0,cj2) and Z^a ~ 0.8iV(0,ai) + 0.2iV(6,0.01). 
This contamination corresponds to a strong contamination on the third com- 
ponent. Note that var(Zj3) = 6.562. 

C23: Zij are independent and such that Zi\ ~ A^(0, o\)^ Z-a ~ 0.9A^(0, o\)^ 
0.1Ar(15,0.01) and ^^3 ~ 0.9iV(0,(T|) + 0.1 A^(20, 0.01). This contamination 
corresponds to a mild contamination on the last two components. Note that 
var(Zi2) = 23.851 and var(Zi3) = 36.901. 



We also include a long-tailed model, namely a Cauchy model, labe- 
led Ccauchyi which is defined by taking Z.j~C3(0, Xl) with I]= diag(crf ,o"2,o"3), 
where Cp(0, S) stands for the p-dimensional elliptical Cauchy distribution 
centered at with scatter matrix S. For this situation, the covariance op- 
erator does not exist, and thus the classical principal components are not 
defined. 

It is worth noting that the directions ^\ , (j)2 and (j)^, correspond to the clas- 
sical principal components for the case Co, but not necessarily for the other 
cases. For instance, when cr^ = VAR, C^^a interchanges the order between (/>! 
and i?!)3, that is, (f)^ = (?^R,i(C3_a) as defined in (3.1), and so it corresponds 
to the first principal component of the covariance operator, while (j)i is the 
second and <^2 is the third one. 

8.4. Simulation results. For each situation, we compute the estimators 
of the first three principal directions and the square distance between the 
true and the estimated direction (normalized to have L^ norm 1), that is. 



Note that all the estimators except those penalizing the norm, are such 
that ||(/'j|| = l. Tables 4 to 9 of the supplementary material [Bali et al. 
(2011b)] report the means of Dj over replications, which hereafter is re- 
ferred to as mean square error. To help understand the influence of the grid 
size m on the estimators. Tables 3, 4 and 5 give the mean squared errors for 
m = 50, 100, 150, 250 and 250, under Co for various values of the penalizing 
parameters. As can be seen, for the first two components some slight im- 
provement is observed when using m = 250 as opposed to ttt. = 50 points, but 
at the expense of increasing the computing time about 2.6 fold. On the other 
hand, for the third principal direction, taking m = 100 compared to m = 50 
reduces the mean square error by at least a half for the penalized estimators, 
while the gain is not so prominent for the raw estimates. The size ?n, = 50 
is selected for presentation in the remainder of our study since it provides 
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a reasonable a compromise between the performance of the estimators and 
the computational time. 

To help understand the effect of penalization, consider Table 6. This table 
shows results for the raw and penalized estimators under Cq for different 
choices of the penalizing parameters. From this table, we see that a better 
performance is achieved in most cases when a = 3 is used. To be more 
precise, the results in Table 6 show that the best choice for (?ips ,j is p = 2n~^ 
for all j. Note that p = 1.5n~^ gives fairly similar results when using the 
M-scale, reducing the mean squared error relative to the raw estimate by 
about a half and a third for j = 2 and 3, respectively. 

For the norm penalized estimators, </>pN,j5 the best choice for the penaliz- 
ing parameter seems to depend upon both the component to be estimated 
and the scale estimator used. For instance, when using the standard devia- 
tion, the best choice is r = O.lOn"^, for j = 1 and 2 while for j = 3 a smaller 
order is needed to obtain an improvement over the raw estimators, with the 
value T = 0.75n~^ leading to a small gain over the raw estimators. For the 
robust procedures, larger values are needed to see an advantage to using the 
penalized norm approach relative to the raw estimators. For example, for 
j = 1, the largest reduction is observed when r = 2n~^ while for j = 2, the 
best choices correspond to r = 0.5n~^ and r = 0.25n~^ when using the MAD 
and M-scale, respectively. When using the M-scale, a good compromise is to 
choose r = 0.75n~^, which gives a reduction of around 30% and 50% for the 
first and second principal directions, respectively, although smaller values 
of T are again better for estimating the third component. 

Based upon the above observations, we report here only the results corre- 
sponding to p = 1.5n~^ and r = 0.75n~^ for the penalized estimators cpp^j 
and (pPSjj respectively, under the contamination models and the Cauchy 
model. Results for other choices of p and r are given in Bali et al. (2010). 

The simulation study confirms the expected inadequate behavior of the 
classical estimators in the presence of outliers. Under contamination, the 
classical estimators of the principal directions do not estimate the target 
directions very accurately. This is also the case when considering the Cauchy 
distribution. Curiously, though, the principal directions, under the Cauchy 
model, do not seem to be totally arbitrary and they partially recover (pi , 02 
and (j)3 when the standard deviation is used, although not as well as when 
using a robust scale, even though the covariance operator does not exist, nor 
do the population principal directions as defined in (2.1). 

The robust estimators of the first principal directions are not heavily af- 
fected by any of the contaminations, while the estimates of the second and 
third principal directions appear to be most affected under model C^^a- In 
particular, for the third direction, the projection-pursuit estimators based 
on an the MAD seems to be most affected by this type of contamination 
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when penalizing the norm, although much less so than the classical meth- 
ods. With respect to the contamination model Cs^a, the estimators (j)pN,j, 
which are the robust penalized norm estimators, tend to have the best per- 
formance among all the robust competitors for the first two components, 
and, in particular, when using the M-scale; see Table 8. It is worth noting 
that the classical estimators of the first component are not affected by this 
contamination when penalizing the norm since the penalization dominates 
the contaminated variances. The same phenomena is observed under C-s^b 
when using the classical estimators for the selected amount of penalization. 
For the raw estimators, the sensitivity of the classical estimators under this 
contamination can be observed in Table 7. We refer to Bali et al. (2010) for 
the behavior when other values of the smoothing parameters are chosen. 

As noted in Silverman (1996), for the classical estimators, some degree of 
smoothing in the procedure based on penalizing the norm will give a better 
estimation of (pj in the L^ sense under mild conditions. In particular, both 
the procedure penalizing the norm and the scale provide some improvement 
with respect to the raw estimators if ^{(j)j) < ^{(pe), when j < i. This means 
that the principal directions are rougher as the eigenvalues decrease [see 
Pezzulli and Silverman (1993) and Silverman (1996)], which is also reflected 
in our simulation study. The advantages of the smooth projection pursuit 
procedures are most striking when estimating 02 and (ps with an M-scale 
and using the penalized scale approach. 

As expected, when using the sieve estimators, the Fourier basis gives 
the best performance over all the methods under Co, since our data were 
generated from this basis; see Table 9. The choice of the i?-spline basis 
give similar results quite to those obtained with (pps,j when estimating the 
flrst direction, except under Ccauchy where the penalized estimators show 
a better performance. For the second and third components, the estimators 
obtained with the 5-spline basis show larger the mean square errors than 
the raw or penalized estimators. 

8.5. Kth fold simulation. Table 1 reports the computing times in min- 
utes for 1,000 replications and for a fixed value of r or p, run on a computer 
Core Quad 17 930 (2.80 GHz) with 8 Gb of Ram memory. We also report 
the computing times when using the sieve approach with the Fourier basis 
and a fixed value of pn ■ This suggests that the leave-one-out cross-validation 
may be difficult to perform, and so a K-io\d approach is adopted instead. 
It is worth noticing that the robust procedures based on the mad are much 
faster than those based on the M-scale, so they may be preferred in terms of 
computing time. However, as mentioned in Section 2, the main disadvantage 
of the MAD is its low efficiency and lack of smoothness, which is related to the 
discontinuity of its infiuence function. This is particularly important when 
estimating principal components in the finite-dimensional setting, since as 
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Table 1 

Computing times m m,inutes for 1,000 replications and a fixed 

value of T, p or pn (when using the Fourier basis) 





sd 


mad 


M-scale 


0H,AW 


5.62 


6.98 


17.56 


0PS 


7.75 


9.00 


20.18 


0PN 


31.87 


33.21 


44.04 


0SI 


0.5 


5.22 


16.08 



it was pointed out by Cui, He and Ng (2003) and Croux and Ruiz-Gazen 
(2005) the variances of some elements of the estimated principal directions 
may blow up when using the mad leading to highly inefficient estimators. 
As expected and mentioned in Section 8.4, Table 6 reveals a high loss of effi- 
ciency for the MAD, in our functional setting, for any choice of the smoothing 
parameter. 

For the procedure which penalizes the scale or the norm, the smooth- 
ing parameters p and r are selected using the procedure described in Sec- 
tion 7 with K = 4 and £ = 1. Due to the extensive computing time, we 
have only performed 500 replications. The simulation results when penal- 
izing the scale function, that is, for the estimators defined through (3.6), 
are reported in Table 2. Under Co, when estimating the second and third 
principal directions, the robust estimators based on the M-scale combined 
with a penalization in the scale clearly have smaller mean square error than 
the raw estimators, while those penalizing the norm improve the perfor- 
mance of the raw estimators and also that of 4>ps,j7 on the first and second 
directions. 

From the results in Table 2 we observe that the classical estimators 
are sensitive to the contaminations in the simulation settings, and, except 
for contaminations in the third component, the robust counterpart shows 
a clear advantage. Note that C-s^b affects more the classical estimators when 
the smoothing parameter is selected by the robust A'-fold cross-validation 
method than for the fixed values studied in the previous section. This can 
be explained by the fact that contamination C^^b is a mild contamination in 
the third component which has a large 1103 |P, and so the classical estima- 
tors are more sensitive to it, just as the raw estimators, if smaller values of 
the smoothing parameter are chosen. It is worth noticing that the penalized 
robust estimators based on the M-scale improve the performance of the raw 
estimators based on the M-scale, even under contaminations, when the pe- 
nalizing parameter is selected using the K-fold approach. This advantage is 
more striking when penalizing the norm and when the two first principal 
components are considered. 
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Table 2 
Mean values of ||<^j/||<7!>j|| — 4>j\^ when the penalizing parameter is selected using K-fold 

cross-validation 





Scale estimator 




4>ps,j 






</'PN,j 




Model 


i = i 


j = 2 


i = 3 


j = l 


j = 2 


i = 3 


Co 


SD 


0.0073 


0.0094 


0.0078 


0.0075 


0.0094 


0.0360 




MAD 


0.0662 


0.0993 


0.0634 


0.0497 


0.0660 


0.2573 




M-scale 


0.0225 


0.0311 


0.0172 


0.0208 


0.0271 


0.0839 


C2 


SD 


1.2840 


1.2837 


0.0043 


1.2076 


1.2232 


0.0301 




MAD 


0.3731 


0.3915 


0.0504 


0.3360 


0.3770 


0.2832 




M-scale 


0.4261 


0.4286 


0.0153 


0.3679 


0.4049 


0.1607 


Cs,a 


SD 


1.7840 


1.8901 


1.9122 


1.7795 


1.8861 


1.9134 




MAD 


0.2271 


0.5227 


0.5450 


0.0573 


0.2289 


0.9540 




M-scale 


0.2176 


0.4873 


0.5437 


0.0257 


0.1187 


0.8710 


Cs^h 


SD 


0.0192 


0.8350 


0.8525 


0.0173 


0.5902 


0.7502 




MAD 


0.0986 


0.3930 


0.3820 


0.0553 


0.1417 


0.5167 




M-scale 


0.0404 


0.2251 


0.2285 


0.0241 


0.1080 


0.3174 


C23 


SD 


1.7645 


0.5438 


1.6380 


1.7537 


0.6496 


1.4305 




MAD 


0.2407 


0.3443 


0.2064 


0.1414 


0.2214 


0.6824 




M-scale 


0.2613 


0.3707 


0.2174 


0.1313 


0.1870 


0.5901 


^Cauchy 


SD 


0.3580 


0.4835 


0.2287 


0.2862 


0.3525 


0.3435 




MAD 


0.0788 


0.1511 


0.1082 


0.0613 


0.0855 


0.3147 




M-scale 


0.0444 


0.0707 


0.0434 


0.0349 


0.0463 


0.1465 



Note that we choose ^ = 1, and so our focus was on the first principal com- 
ponent. To improve the observed performance, a different approach should 
be considered, maybe by selecting a different smoothing parameter for each 
principal direction. 



9. Concluding remarks. In this paper, we propose robust principal com- 
ponent analysis for functional data based on a projection-pursuit approach. 
The different procedures correspond to robust versions of the unsmoothed 
principal component estimators, to the estimators obtained penalizing the 
scale and to those obtained by penalizing the norm. A sieve approach based 
on approximating the elements of the unit ball by elements over finite- 
dimensional spaces is also considered. In particular, the procedures based 
on smoothing and sieves are new. A robust cross-validation procedure is 
introduced to select the smoothing parameters. Consistency results are de- 
rived for the four type of estimators. Finally, a simulation study confirms 
the expected inadequate behavior of the classical estimators in the presence 
of outliers, with the robust procedures performing significantly better. In 
particular, the procedure based on an M-scale combined with a penaliza- 
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tion in the norm, where the smoothing parameter is selected via a robust 
K-fold cross-validation, is recommended. 

Among other contributions we highlight the following: 

(a) We obtain the continuity of the principal directions and eigenvalue 
functionals, which implies the asymptotic qualitative robustness of the cor- 
responding estimators. This extends the results of Li and Chen (1985) from 
Euclidean spaces to infinite-dimensional Hilbert spaces, where the unit ball 
is not compact. Noncompactness poses technical challenges which we over- 
come with tools from functional analysis. 

(b) Our results not only include the finite-dimensional case but also im- 
prove upon some of the results obtained in that situation for the projection 
pursuit estimators. For example, the assumptions in Li and Chen (1985) 
regarding the robust scale functional are stronger than ours. Also, to derive 
the consistency of the raw estimators, we only require uniform convergence 
over the unit ball of Sn{ct) to cr(a), which holds if the scale functional o"r 
is continuous. This improves upon the consistency results given in Cui, He 
and Ng (2003), who require a uniform Bahadur expansion for Sn(o) over the 
unit ball. 

(c) A key step in proving the continuity of the projection pursuit func- 
tional is to show that weak convergence of probability measures over a Hilbert 
space implies uniform convergence of the laws of the projections of the 
stochastic processes, that is. Theorem 6.2. This uniform convergence re- 
sult can be useful in other statistical problems where projection methods 
are considered. 

(d) The proofs for the penalized estimators include, as particular cases, 
the estimators defined by Rice and Silverman (1991) and studied by Pezzulli 
and Silverman (1993), and those considered by Silverman (1996). Extending 
the results to scale estimators other than the standard deviation required 
more challenging arguments since, unlike the classical setting, the projection- 
pursuit index sf^{a) cannot be expressed in the simple form (a, F^q) for some 
compact operator F„. 

APPENDIX 

In this Appendix, we give the proofs of the results stated in Section 4. 
Some technicalities are omitted, and we refer to the technical report [Bali 
et al. (2010)] for details. Before presenting the proof, some additional nota- 
tion is needed. 

Denote by Cm^i the linear space spanned by {(J)r,i, ■ ■ ■ ,(pR,m-i}i and 
let jCm~i be the linear space spanned by the first m — 1 estimated principal 

directions, that is, by {(/>si,i5 • • • ></'sl,m-i} or {cpi,. . . ,(pm-i}, where it will 
be clear in each case which linear space we are considering. The latter in- 
cludes the situation of the linear spaces spanned by {</>raw,1) ■ ■ ■ , </'RAW,m-i}) 
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{(pps,i, ■■■, 0ps,m-i} and {4>pN,i, • • • , ^PN,m-i}- Finally, for any linear space C, 
TTc'-'H —^ C stands for the orthogonal projection onto the linear space C, 
which exists if £ is a closed linear space. In particular, ttCj^^i , tt? and vr-^p„ 
are well defined. 

Moreover, for the sake of simplicity, denote by 7fc = /3^ the linear space 
orthogonal to (pi,. . . ,(j)k and by TTfc = ttt^ the orthogonal projection with 
respect to the inner product defined in Ti. On the other hand, let TTr^k 
be the projection onto the linear space orthogonal to <j)i,. . . ,(j)k in the 
space 7is in the inner product (•,-)t, that is, for any a G ^^S) T^T,k{<^) = 
a — J2j=i{^j 4'j)t4'j- Moreover, let Tr,k stand for the linear space orthogonal 
to Ck with the inner product (•, •),-. Thus, TTr^k is the orthogonal projection 
onto TT,k with respect to this inner product. 

Proof of Theorem 4.1. First note that the fact that aji is a scale 
functional entails that a"n(a) = ||a||cr„(a/||a||). Thus from assumption (ii) 
and the fact that ||a|| < ||a||T) we get that 

(A.l) sup |(T,„(q;)— o" (a)|^-4 and sup |(T,„(a)— cr (a)|— ^0. 

||a||<l l|a||T<l 

(a) To prove that Ai —-> o''^{(t>R,i) it is enough to show that 

(A.2) ^^(0R,l)>Ai + Oa.s.(l), 

(A.3) ^^(0R,l)<Ai + Oa.s.(l), 

where Oa.s. (1) stands for a term converging to almost surely. 

Note that from (A.l), we get that a^,! = cr^(^i) — cr'^{4>i) ^-4 and bn,i = 
Cni'pR,i) ~ ^^(</'R,i) ~~^ 0- Using that o" is a scale functional and that 
o"^(</'R,i) = supQ,g5(j^(a), we obtain easily that 

V||0i||/ \m\r 

concluding the proof of (A.2). 

To derive (A.3), note that since c^r^i G Tis, |1<?^r,i|1t < oo and ||i;AR,i|lr > 
||0R,i|| = 1, then, defining /3i = (^R,i/||(/'R,i||r, we have that ||/3i||r = 1, which 

implies that Ai = (7^(0i) > '7^(01 ) — P^{(t'i) > <7^(/3i) — P^(/3i)- Hence, using 
that (7r is a scale functional and that ^(aa) = a'^'^{a), for any a G M, we 
get 

'^n(<^R,l) - P*(<^R,l) ^^{(^Ra) + fen,l - P^(0R,l) 



Ai><(/3i)-p^(/3i) 



^R,l|lr ll0R,l|l? 



When p = 0, we have defined p^((/)R^i) = and similarly when r = 0. So 
from now on, we will assume that r„ > and p„ > 0. Since bni = Oa.s. (1), 



28 BALI, BOENTE, TYLER AND WANG 

p ^-4- and r ^-4- 0, we have that /?^((/)r_i) ^-4-0 and ||0r,i||t —-> ||0R,i| 



a.s. 



(A.5) ^2(^^/||^^||)J-^2( 



1, concluding the proof of (A.S). Hence, Ai ^-^ a (0r^i). 

Prom (A.l) and the fact that \\4>i\\ < 1, we obtain that Ai — a'^{(t)i) = 
^ni4'i) ~ <^^('/'i) ~~^ 0- Therefore, using that Ai ^-4- cr'^{(t>R^i), we get that 

(A.4) ^2(^^)J-^2(^^^^)_ 

Moreover, the inequahties cr'^{4>R^i) > o"^((?l)i/||0i||) > (t^(<^i) obtained above 
also imply that 

'R,l)- 
Using that ||^i||r = 1, we get that r^(^i) = 1 - ||(^i|p = 1 - a'^ ($1) / a'^ (cpi / 
ll^ill). Hence, (A.4) and (A.S) entail that r^(^i) -^0. 

It only remains to show that p^{<j)i) ^-^ 0, which follows easily from the 
fact that Ai -^(t2(0r,i), (7l{(j)R^i) ^ a'^ {(J)k^i) , p^O and \\(l)R,i\\r ^ I 
since Ai > (T^(^i) - p[^i,^i] > (o-^((^r^i) - p[(/>R^i,(?iR^i])/||0R^i||2. 

Note that we have not used the weak continuity of cr as a function of a 
to derive (a). 

(b) Note that since ||(/'i||t = 1; we have that \\(j)i\\ < 1. Moreover, from (a), 
\\(j)i\\ ^-4 1. Let (pi = (pi / \\(j)i\\ , then (pi & S and (7{(pi) = a{(pi)/\\(pi\\. Us- 
ing that (t'^{(Pi) ^-4 (T^((/)r^i) and \\(pi\\ —^ 1, we obtain that 0"^((/)i) ^-4 
c^('/'R,i)i and thus the proof follows using Lemma 4.1. 

(c) Let us show that Am ^-4 cr'^{(pR,m)- The proof will be done in several 
steps by showing 

(A.6) sup \a'^{TTm-ia)-(T'i(jfr,m-ia)\'^0, 

l|a|k<i 

(A.7) a2(0R,m)>Am + Oa.s.(l), 

(A.8) fT^(0R,m)<Am + Oa.s.(l). 

Note that (A.6) corresponds to an extension of assumption (ii) while (A.7) 
and (A.S) are analogous to (A. 2) and (A.S). 

We begin by proving (A.6). Note that sup||Q,|i <i|o"^(7rm-ia) — o'^(vrr,m-ia)| < 

SUP||a||^<l|o-2(vrm-ia)-0"^(vrr,m-ia)|+SUP||„||^<l|(T^(7f^,rn-ia)-Cr^(vrr,m-ia)|- 

Using (A.l) and the fact that if ||a||r < 1, then ||7rT-,m-i"||r < 1, we get that 
the second term on the right-hand side converges to almost surely. To 
complete the proof of (A.6), it remains to show that 

(A.9) sup |o-^(7rm„ia) -cr^(7f^,m._ia)| -^0. 

\\a\\-r<l 

Using that (pj ^-4 (pRj and that T^{(pj) = T\(pj,(pj~\ ^-^ 0, for 1 < j < m. — 
1 and arguing as in Silverman (1996) [see Bali et al. (2010) for details], 
we get that, for 1 < j < m - 1, supii^ii^i\\{a, (pRj)(pRj - {a,(pj)r(pj\\ -^0, 
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a.s. 



entailing that sup||Q,|i <]^||7rT-^m-ia — TTm-iall ^-^0. Therefore, using that a 
is weakly uniformly continuous over the unit ball, we get easily that (A. 9) 
holds, concluding the proof of (A. 6). 

As in (a), we will next show that (A. 7) holds. Using again that cj is a scale 
functional, we get easily that sup^g^p-y^.^ a'^[a) = snp^^g a^ (TTm-ia) , so us- 
ing again that |[<^m|| < ll'^mllr = 1, we obtain that cr'^{<pR,m) = 
sup^^sa'^i'iTm-ia) > a'^{TTm-i(j)m/\\(pm\\) > cr'^ {iTm-ij)m) ■ From (A.6)jmdthe 

fact that ||(^m||r = 1, we get ^hat bm =JT'^{'Km-l4>m) - Crl{TTr,m-l(pm) ^ 

0, and so since 7rr,m-i4>m = 4>m and ||^m|| < 1, we get that a'^{(j)R,m) > 

a'^{7rm-i(pm) = alinr^rn^-icpm) + Oa.s.(l) = ^m + Oa.s.(l)> Completing the proof 
of (A.7). 

We will show now (A.S). Note that (l)B.,m £ ^^s, so that ||(/>R,m||T < oo 
and ||(/'R,m||r — >■ ||</'R,m|| = 1- Usiug that cr is a scale functional, the fact 
that Xm = o-l((i>m) >o-^(0„,)-/)^(0„,) =sup||^||^^^^^g;^^^_Jcr2(Q)-p^(a)} 
and that for any a G Tis such that |[a||T- = 1 we have that ||7rT-^m-io||r ^ Ij 
we get easily that A™ > sup||(^||^=i{a^(7r^„,_iQ) - p^{TTr,m-ia)}, and so 

Am > (o-^(vrr,m-i0R,m) " /o'I'(vrr,-m-i0R,m))/||0R,m ||r • From (A.6) we obtain 
that dm = <7n(^T,m-i4'R,m) — cr'^ {'^m-i4'R,m) —-^ 0. Moreover, the fact that 
r ^-4 entails that ||i;AR,m||T —-> \\4'R,m\\ = 1- On the other hand, using 
that p^[(j)() ^-4 0, 1 < i < m — 1, and the fact that p —^ implies that 
p^((^R^m) = Oa.s.(l), analogous arguments to those considered in Pezzulli and 
Silverman (1993) allow us to show that p^(7fm-i0R,m) = 
p\Tfm-i(l)R,m,Tfm-i4>R,m] "^0. Hence, we get that 

O-^(7rm_i0R,m) +dm- P^(vfr,m-10R,m.) 



Am ^ 



> 



l + o(l) 

O"^(vrm-10R,m) + dm - Oa.s.(l) 



l + o(l) 

= 0-^('^R,m)+Oa.s.(l); 

where the last equality follows from the fact that vTm-i^R,™, = 0R,m.- 
Therefore, from (A.7) and (A.S), we obtain that Am —-> cr^(0R,m.)- 
On the other hand, (A.6) entails that \m — '^'^{4>m) = o'^i^'m) — (^'^ {4'm) —-^ 

0, which together with Am —-> cr'^{<pR,m) implies that (j'^{4>m) —-^ cy"^ {<t>R,m) ■ 
To complete the proof of (c), it remains to show that T'^{(j)m) ——^ and 

p^((/>m) ——^ 0. As in (a), we have that the following inequalities converge to 

equalities: 

(A.IO) Cr'^{(t>R,m) > O-^f TTm-l ,,^"'„ ) > O-^(vrm-l^m) = Am + Oa.s.(l)- 
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Using that cr is a scale estimator and that H^mllr = 1, we get that T^{(j)m) = 
l-||^m|P = l-o-^(vrm_i^m)/o-2(7rm_i^m/||0m||), which together with (A.IO) 
entails that the second term on the right-hand side is 1 + Oa.s.(l) ^^'^ so, 
T\(l)m,4>m] ^ 0, entailing that ||^„J ^ 1. 
On the other hand, we also have that 

(A.ll) Xrn = Crli^m) > (T^(^m) - P^(0m) > (^^ {(pR,m) + Oa.s.(l)> 

SO using that Am = a'^{(j)m) —-> cr'^{4'R,m), we obtain that p"^{4'rn) —-> 0, 
concluding the proof of (c) . 

(d) We have already proved that when m = l the result holds. We proceed 
by induction and assume that {(pi, cj)^/)'^ — )■ 1, T^{(j)i) ^-4 and p^((/>f ) ^-4- 
for 1 < £ < ?7i — 1, to show that {4'm,4'R,m)^ — ^ 1- Without loss of generality, 
we can assume that (pe ——> (pR/j fo^^ 1 <£ <m— 1. Denote by (j)j = (l)j/\\(j)j\\. 
Then, for 1 < i <m — 1, {{(piW -^ 1, and so (pi ——^ (pR,e- It suffices to show 

that ((^R,m,0m)^ ^1. 

Using (c) we have that a'^{(pm) —-> cr'^ {4'R,m) and that \\(j)m\\ —-> 1, and 
so cr'^{(pm) ——> cr'^{4'R,m)- The proof follows now from Lemma 4.1 if we show 
that {^m,(pe) ^0,l<i<m-l. 

Using that T^{(pi) —^ 0, for 1 <i<m—l, and that from (c) T'^{4>m) ——^ 
we get that T\(pi,(prn'] ——^ for 1 < i <m — 1. Therefore, the fact that 
(<Pm.,'h)r = entails that {(pm,(pe) = (<Pm,<Pe)T - r\(pe,(pm] ^ 0, and so 
{(pm,4'e) ~~^ Oi concluding the proof. D 
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SUPPLEMENTARY MATERIAL 

Supplement A: Robust functional principal components 

(DOL 10.1214/11-AOS923SUPPA; .pdf). In this Supplement, we give the 
proof of some of the results stated in Sections 4 and 6. 

Supplement B: Robust functional principal components 

(DOI: 10.1214/11-AOS923SUPPB; .pdf). In this Supplement, we report the 
results obtained in the Monte Carlo study for the raw estimators and for 
the penalized ones when the smoothing parameters are fixed. 
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